Modify list_* methods in catalogs to return Iterators#2172
Modify list_* methods in catalogs to return Iterators#2172jayceslesar wants to merge 9 commits intoapache:mainfrom
list_* methods in catalogs to return Iterators#2172Conversation
rambleraptor
left a comment
There was a problem hiding this comment.
This looks awesome! Thanks for doing it. I've got one code question and then a versioning question:
Do we need to wait for 1.0 to change these APIs? I'm hoping the answer will be no.
pyiceberg/catalog/sql.py
Outdated
| if tables := list(self.list_tables(namespace)): | ||
| raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.") |
There was a problem hiding this comment.
When we (eventually) handle pagination tokens, I believe this will cause the process to consume the entire paginated list. If it's a long list, that'll require a lot of server calls to fetch all of the tables.
If that's true, what if we did this instead?
| if tables := list(self.list_tables(namespace)): | |
| raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.") | |
| tables = self.list_tables(namespace) | |
| try: | |
| next(tables) | |
| raise NamespaceNotEmptyError(f"Namespace {namespace_str} is not empty. {len(list(tables))} tables exist.") | |
| except StopIteration: | |
| pass | |
There was a problem hiding this comment.
I see, maybe it makes the most sense to make a little helper function here for if a namespace is empty or not? can do that in a different MR but i dont know if thats an official method upstream https://github.com/search?q=repo%3Aapache%2Ficeberg%20namespacenotempty&type=code
There was a problem hiding this comment.
yeah this makes sense, was thinking could consolidate logic across catalogs but not sure if that makes sense given the implementation differences
|
Just brought this up to date if you want to give another look! I am going to take a look at your existing comment later! |
|
Still need to do some work here -- brought in a lot of those formatting changes and havent looked in a while hahahaha |
| # Hierarchical namespace is not supported. Return an empty list | ||
| if namespace: | ||
| return [] | ||
| return iter([]) |
There was a problem hiding this comment.
I think it would be best not to use iter when not needed. In fact, we only need it for the list-tables of the REST Catalog. Does mypy allow to use list here? Same for line 748
There was a problem hiding this comment.
mypy says
pyiceberg/catalog/hive.py:746: error: Incompatible return value type (got "list[Never]", expected "Iterator[tuple[str, ...]]") [return-value]
pyiceberg/catalog/hive.py:746: note: "list" is missing following "Iterator" protocol member:
pyiceberg/catalog/hive.py:746: note: __next__
Found 1 error in 1 file (checked 166 source files)
There was a problem hiding this comment.
I see. The reason I ask is because iter doesn't allow for len:
Python 3.14.1 (main, Dec 2 2025, 12:51:37) [Clang 17.0.0 (clang-1700.4.4.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> len(iter([1,2,3]))
Traceback (most recent call last):
File "<python-input-0>", line 1, in <module>
len(iter([1,2,3]))
~~~^^^^^^^^^^^^^^^
TypeError: object of type 'list_iterator' has no len()
>>> len([1,2,3])
3
|
I feel like we should hold off on merging this, and additionally hold off on a change I want to make for the inspect tables to also be lazy until we are ready for a 1.0.0? Wdyt? I don't think it's wise to just release a breaking change on this interface without some form of warning. Happy to keep this periodically up to date if we want to go that route |
Rationale for this change
We want to support pagination and we need to be able to return iterators to do so.
See #2158 for more context
Are these changes tested?
Existing tests were modified
Are there any user-facing changes?
yes, this is user facing and breaking but overall an improvement.